Results 1 - 20 of 40
1.
Nature ; 625(7996): 832-839, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37956700

ABSTRACT

AlphaFold2 (ref. 1) has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein's biological function often depends on multiple conformational substates (ref. 2), and disease-causing point mutations often cause population changes within these substates (refs. 3,4). We demonstrate that clustering a multiple-sequence alignment by sequence similarity enables AlphaFold2 to sample alternative states of known metamorphic proteins with high confidence. Using this method, named AF-Cluster, we investigated the evolutionary distribution of predicted structures for the metamorphic protein KaiB (ref. 5) and found that predictions of both conformations were distributed in clusters across the KaiB family. We used nuclear magnetic resonance spectroscopy to confirm an AF-Cluster prediction: a cyanobacterial KaiB variant is stabilized in the opposite state compared with the more widely studied variant. To test AF-Cluster's sensitivity to point mutations, we designed and experimentally verified a set of three mutations predicted to flip KaiB from Rhodobacter sphaeroides from the ground to the fold-switched state. Finally, screening for alternative states in protein families without known fold switching identified a putative alternative state for the oxidoreductase Mpt53 in Mycobacterium tuberculosis. Further development of such bioinformatic methods in tandem with experiments will probably have a considerable impact on predicting protein energy landscapes, essential for illuminating biological function.
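The core idea of AF-Cluster, splitting a multiple-sequence alignment into groups of similar sequences so that each group can be used for a separate prediction, can be illustrated with a minimal sketch. This is not the authors' clustering procedure: the greedy identity-threshold rule, the 0.8 default threshold, and the names `identity` and `greedy_cluster` are assumptions made for illustration, and the AlphaFold2 step itself is left out.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical residues over aligned columns,
    ignoring columns where both sequences have a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def greedy_cluster(msa: list[str], threshold: float = 0.8) -> list[list[int]]:
    """Assign each sequence to the first cluster whose representative (first member)
    it matches at >= threshold identity; otherwise start a new cluster."""
    clusters: list[list[int]] = []
    for i, seq in enumerate(msa):
        for members in clusters:
            if identity(msa[members[0]], seq) >= threshold:
                members.append(i)
                break
        else:  # no existing cluster was similar enough
            clusters.append([i])
    return clusters


msa = ["ACDEFG", "ACDEFA", "TTTTTT", "TTTTTA"]
print(greedy_cluster(msa))  # [[0, 1], [2, 3]]
```

Each returned cluster would then serve as a small alignment for its own structure prediction; a greedy rule is used here only because it is short and dependency-free.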


Subject(s)
Cluster Analysis , Machine Learning , Protein Conformation , Protein Folding , Proteins , Sequence Alignment , Mutation , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Rhodobacter sphaeroides , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism
3.
Int J Ment Health Nurs ; 33(1): 202-212, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37788130

ABSTRACT

This article aims to draw attention to increasing genericism in nurse education in the United Kingdom, which leaves mental health nursing students with less specialist mental health education, and to oppose this direction. In 2018, the Nursing and Midwifery Council produced the 'Future Nurse' standards, which directed changes to pre-registration nurse education. This led to dissatisfaction among many mental health nurses, specifically regarding reduced mental health content for students studying mental health nursing. Concerns were raised in public forums and evolved into a grassroots national movement, 'Mental Health Deserves Better' (#MHDeservesBetter). This is a position paper presenting the perspective of many mental health nurse academics working at universities within the United Kingdom. Mental health nurse academics collaborated to develop ideas and articulate arguments and perspectives which present a strong position on the requirement for specialist pre-registration mental health nurse education. The key themes explored are: a conflict of ideologies in nursing, no parity of esteem, physical health care needs to be contextualized, the unique nature of mental health nursing, ethical tensions and values conflict, implications for practice, necessary improvements overlooked and the dangers of honesty and academic 'freedom'. The paper concludes by asserting a strong position on the need for a change of direction away from genericism and calls on mental health nurses to rise from the ashes to advocate for a quality education necessary to ensure quality care delivery. The quality of mental health care provided by mental health nurses has many influences, yet the foundation offered through pre-registration education is one of the most valuable.
If the education of mental health nurses does not attend to the distinct and unique role of the mental health nurse, standards of mental health care may diminish without assertive action from mental health nurses and allies.


Subject(s)
Education, Nursing, Baccalaureate , Psychiatric Nursing , Humans , Mental Health , United Kingdom , Health Education
4.
Elife ; 12, 2023 02 27.
Article in English | MEDLINE | ID: mdl-36847334

ABSTRACT

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we introduce ProteInfer, which instead employs deep convolutional neural networks to predict a variety of protein functions (Enzyme Commission (EC) numbers and Gene Ontology (GO) terms) directly from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user's personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, please visit https://google-research.github.io/proteinfer/.


Subject(s)
Algorithms , Neural Networks, Computer , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Software , Computational Biology/methods
5.
Nat Biotechnol ; 41(8): 1073-1074, 2023 Aug.
Article in English | MEDLINE | ID: mdl-36702894
6.
Methods Mol Biol ; 2586: 49-77, 2023.
Article in English | MEDLINE | ID: mdl-36705898

ABSTRACT

Here we detail the LandscapeFold secondary structure prediction algorithm and how it is used. The algorithm was previously described and tested in (Kimchi O et al., Biophys J 117(3):520-532, 2019), though it was not named there. The algorithm directly enumerates all possible secondary structures into which up to two RNA or single-stranded DNA sequences can fold. It uses a polymer physics model to estimate the configurational entropy of structures including complex pseudoknots. We detail each of these steps and ways in which the user can adjust the algorithm as desired. The code is available on the GitHub repository https://github.com/ofer-kimchi/LandscapeFold .
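A toy version of direct enumeration is sketched below, assuming Watson-Crick plus G-U pairing, stems of at least two stacked pairs, and hairpin loops of at least three unpaired bases. It lists every nucleotide-disjoint combination of stems (so crossing stems, i.e. pseudoknots, are allowed) but does not reproduce LandscapeFold's stem definitions, its two-strand handling, or its polymer-physics entropy model; `find_stems` and `enumerate_structures` are hypothetical names.

```python
# Allowed base pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stems(seq: str, min_stem: int = 2, min_loop: int = 3) -> list[tuple]:
    """List every run of >= min_stem stacked pairs (i, j), (i+1, j-1), ..."""
    stems = []
    n = len(seq)
    for i in range(n):
        for j in range(n - 1, i, -1):
            pairs = []
            k = 0
            # Extend inward while bases pair and >= min_loop bases stay unpaired inside.
            while i + k < j - k - min_loop and (seq[i + k], seq[j - k]) in PAIRS:
                pairs.append((i + k, j - k))
                k += 1
                if len(pairs) >= min_stem:
                    stems.append(tuple(pairs))
    return stems


def enumerate_structures(seq: str, min_stem: int = 2, min_loop: int = 3) -> set[frozenset]:
    """Return every distinct set of base pairs built from nucleotide-disjoint stems.
    Stems may cross, so pseudoknots are included; the unfolded chain is the empty set."""
    stems = find_stems(seq, min_stem, min_loop)
    structures = {frozenset()}

    def extend(start: int, used: frozenset, pairs: frozenset) -> None:
        for idx in range(start, len(stems)):
            stem = stems[idx]
            nts = {x for pair in stem for x in pair}
            if nts & used:  # a nucleotide can pair at most once
                continue
            new_pairs = pairs | frozenset(stem)
            # Identical pair sets reached via different stem combinations collapse here.
            structures.add(new_pairs)
            extend(idx + 1, used | nts, new_pairs)

    extend(0, frozenset(), frozenset())
    return structures


print(len(enumerate_structures("GGGAAAACCC")))  # 6
```

The number of combinations grows exponentially with the number of stems, so an exhaustive approach like this is practical only for short sequences or with additional pruning.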


Subject(s)
Algorithms , RNA , Nucleic Acid Conformation , RNA/genetics , Entropy , DNA, Single-Stranded
7.
Nucleic Acids Res ; 51(D1): D753-D759, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477304

ABSTRACT

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.


Subject(s)
Microbiota , Sequence Analysis , Genomics/methods , Metagenome , Metagenomics/methods , Microbiota/genetics , Software , Sequence Analysis/methods
8.
Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.


Subject(s)
Databases, Protein , Humans , Amino Acid Sequence , Artificial Intelligence , Internet , Proteins/chemistry , Software
9.
Database (Oxford) ; 2022, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35961013

ABSTRACT

Over the last 25 years, biology has entered the genomic era and is becoming a science of 'big data'. Most interpretations of genomic analyses rely on accurate functional annotations of the proteins encoded by more than 500 000 genomes sequenced to date. By various estimates, only half the predicted sequenced proteins carry an accurate functional annotation, and this percentage varies drastically between different organismal lineages. Such a large gap in knowledge hampers all aspects of biological enterprise and, thereby, is standing in the way of genomic biology reaching its full potential. A brainstorming meeting to address this issue, funded by the National Science Foundation, was held on 3-4 February 2022. Bringing together data scientists, biocurators, computational biologists and experimentalists within the same venue allowed for a comprehensive assessment of the current state of functional annotations of protein families. Further, major issues that were obstructing the field were identified and discussed, which ultimately allowed for the proposal of solutions on how to move forward.


Subject(s)
Genomics , Proteins , Base Sequence , Computational Biology , Genome , Molecular Sequence Annotation
10.
Biophys J ; 121(16): 3023-3033, 2022 08 16.
Article in English | MEDLINE | ID: mdl-35859421

ABSTRACT

Collagen fibrils are the major constituents of the extracellular matrix, which provides structural support to vertebrate connective tissues. It is widely assumed that the superstructure of collagen fibrils is encoded in the primary sequences of the molecular building blocks. However, the interplay between large-scale architecture and small-scale molecular interactions makes the ab initio prediction of collagen structure challenging. Here, we propose a model that allows us to predict the periodic structure of collagen fibers and the axial offset between the molecules, purely on the basis of simple predictive rules for the interaction between amino acid residues. With our model, we identify the sequence-dependent collagen fiber geometries with the lowest free energy and validate the predicted geometries against the available experimental data. We propose a procedure for searching for optimal staggering distances. Finally, we build a classification algorithm and use it to scan 11 data sets of vertebrate fibrillar collagens, and predict the periodicity of the resulting assemblies. We analyzed the experimentally observed variance of the optimal stagger distances across species, and found that these distances, and the resulting fibrillar phenotypes, are evolutionarily well preserved. Moreover, we observed that the energy minimum at the optimal stagger distance is broad in all cases, suggesting a further evolutionary adaptation designed to improve the assembly kinetics. Our periodicity predictions are in good agreement with the experimental data on collagen molecular staggering not only for all collagen types analyzed, but also for synthetic peptides. We argue that, with our model, it becomes possible to design tailor-made, periodic collagen structures, thereby enabling the design of novel biomimetic materials based on collagen-mimetic trimers.
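The idea of scanning axial offsets for an energy minimum can be shown with a deliberately crude stand-in for the paper's interaction rules. Here each Lys/Arg counts +1, each Asp/Glu counts -1, and the energy of a stagger is the sum of charge products over overlapping positions. The charge table, the energy function, and the names `stagger_energy` and `best_stagger` are assumptions, not the model described in the abstract.

```python
from typing import Optional

# Toy charge rule (an assumption): Lys/Arg +1, Asp/Glu -1, everything else neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def stagger_energy(seq: str, offset: int) -> int:
    """Sum of charge products between residue i of one molecule and residue i + offset
    of a copy staggered by `offset`: like charges raise the energy, opposite charges lower it."""
    q = [CHARGE.get(aa, 0) for aa in seq]
    return sum(q[i] * q[i + offset] for i in range(len(q) - offset))


def best_stagger(seq: str, min_offset: int = 1, max_offset: Optional[int] = None):
    """Scan offsets in [min_offset, max_offset]; return (lowest-energy offset, all energies)."""
    if max_offset is None:
        max_offset = len(seq) - 1
    energies = {d: stagger_energy(seq, d) for d in range(min_offset, max_offset + 1)}
    best = min(energies, key=energies.get)  # ties resolve to the smallest offset
    return best, energies


best, energies = best_stagger("KGGDGGKGGDGG")
print(best, energies[3], energies[6])  # 3 -3 2
```

For the repeating sequence `KGGDGGKGGDGG`, an offset of 3 aligns every positive residue with a negative one and gives the lowest toy energy, while an offset of 6 aligns like charges and is penalized.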


Subject(s)
Biomimetic Materials , Collagen , Biomimetic Materials/chemistry , Collagen/metabolism , Extracellular Matrix/metabolism , Fibrillar Collagens , Peptides/chemistry
11.
Nat Biotechnol ; 40(6): 932-937, 2022 06.
Article in English | MEDLINE | ID: mdl-35190689

ABSTRACT

Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by >9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools.


Subject(s)
Deep Learning , Amino Acid Sequence , Databases, Protein , Humans , Molecular Sequence Annotation , Proteome/metabolism , Proteomics
12.
Cell Syst ; 12(11): 1019-1020, 2021 11 17.
Article in English | MEDLINE | ID: mdl-34793698

ABSTRACT

Machine-learning-guided protein design is rapidly emerging as a strategy to find high-fitness multi-mutant variants. In this issue of Cell Systems, Wittman et al. analyze the impact of design decisions for machine-learning-assisted directed evolution (MLDE) on its ability to navigate a fitness landscape and reliably find global optima.


Subject(s)
Machine Learning , Proteins
13.
Nat Biotechnol ; 39(6): 691-696, 2021 06.
Article in English | MEDLINE | ID: mdl-33574611

ABSTRACT

Modern experimental technologies can assay large numbers of biological sequences, but engineered protein libraries rarely exceed the sequence diversity of natural protein families. Machine learning (ML) models trained directly on experimental data without biophysical modeling provide one route to accessing the full potential diversity of engineered proteins. Here we apply deep learning to design highly diverse adeno-associated virus 2 (AAV2) capsid protein variants that remain viable for packaging of a DNA payload. Focusing on a 28-amino acid segment, we generated 201,426 variants of the AAV2 wild-type (WT) sequence yielding 110,689 viable engineered capsids, 57,348 of which surpass the average diversity of natural AAV serotype sequences, with 12-29 mutations across this region. Even when trained on limited data, deep neural network models accurately predict capsid viability across diverse variants. This approach unlocks vast areas of functional but previously unreachable sequence space, with many potential applications for the generation of improved viral vectors and protein therapeutics.


Subject(s)
Capsid Proteins/genetics , Dependovirus/genetics , Machine Learning , Genetic Vectors , HeLa Cells , Humans
14.
Nat Biotechnol ; 38(8): 989-999, 2020 08.
Article in English | MEDLINE | ID: mdl-32284585

ABSTRACT

A central challenge in expanding the genetic code of cells to incorporate noncanonical amino acids into proteins is the scalable discovery of aminoacyl-tRNA synthetase (aaRS)-tRNA pairs that are orthogonal in their aminoacylation specificity. Here we computationally identify candidate orthogonal tRNAs from millions of sequences and develop a rapid, scalable approach, named tRNA Extension (tREX), to determine the in vivo aminoacylation status of tRNAs. Using tREX, we test 243 candidate tRNAs in Escherichia coli and identify 71 orthogonal tRNAs, covering 16 isoacceptor classes, and 23 functional orthogonal tRNA-cognate aaRS pairs. We discover five orthogonal pairs, including three highly active amber suppressors, and evolve new amino acid substrate specificities for two pairs. Finally, we use tREX to characterize a matrix of 64 orthogonal synthetase-orthogonal tRNA specificities. This work expands the number of orthogonal pairs available for genetic code expansion and provides a pipeline for the discovery of additional orthogonal pairs and a foundation for encoding the cellular synthesis of noncanonical biopolymers.


Subject(s)
Amino Acyl-tRNA Synthetases/metabolism , RNA, Transfer/metabolism , Amino Acid Sequence , Amino Acyl-tRNA Synthetases/genetics , Computer Simulation , Escherichia coli , Gene Expression Regulation, Bacterial , Green Fluorescent Proteins , Protein Binding , Substrate Specificity
15.
Sci Rep ; 10(1): 3397, 2020 02 25.
Article in English | MEDLINE | ID: mdl-32099005

ABSTRACT

Collagen fibrils are central to the molecular organization of the extracellular matrix (ECM) and to defining the cellular microenvironment. Glycation of collagen fibrils is known to impact on cell adhesion and migration in the context of cancer, and in model studies glycation of collagen molecules has been shown to affect the binding of other ECM components to collagen. Here we use TEM to show that ribose-5-phosphate (R5P) glycation of collagen fibrils (potentially important in the microenvironment of actively dividing cells, such as cancer cells) disrupts the longitudinal ordering of the molecules in collagen fibrils and, using KFM and FLiM, that R5P-glycated collagen fibrils have a more negative surface charge than unglycated fibrils. Altered molecular arrangement can be expected to impact on the accessibility of cell adhesion sites, and altered fibril surface charge on the integrity of the extracellular matrix structure surrounding glycated collagen fibrils. Both effects are highly relevant for cell adhesion and migration within the tumour microenvironment.


Subject(s)
Collagen Type I/chemistry , Extracellular Matrix/chemistry , Ribosemonophosphates/chemistry , Animals , Collagen Type I/metabolism , Extracellular Matrix/metabolism , Glycosylation , Humans , Ribosemonophosphates/metabolism
16.
J Chem Inf Model ; 60(1): 56-62, 2020 01 27.
Article in English | MEDLINE | ID: mdl-31825609

ABSTRACT

The structured nature of chemical data means machine-learning models trained to predict protein-ligand binding risk overfitting the data, impairing their ability to generalize and make accurate predictions for novel candidate ligands. Data debiasing algorithms, which systematically partition the data to reduce bias and provide a more accurate metric of model performance, have the potential to address this issue. When models are trained using debiased data splits, the reward for simply memorizing the training data is reduced, suggesting that the ability of the model to make accurate predictions for novel candidate ligands will improve. To test this hypothesis, we use distance-based data splits to measure how well a model can generalize. We first confirm that models perform better for randomly split held-out sets than for distant held-out sets. We then debias the data and find, surprisingly, that debiasing typically reduces the ability of models to make accurate predictions for distant held-out test sets and that model performance measured after debiasing is not representative of the ability of a model to generalize. These results suggest that debiasing reduces the information available to a model, impairing its ability to generalize.
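One common way to build a distance-based split is to label each held-out ligand by its nearest-neighbour similarity to the training set. The sketch below assumes ligands are represented as sets of fingerprint bits compared with the Tanimoto coefficient, and uses an arbitrary 0.4 cutoff; the abstract does not state which distance measure or threshold the authors used, and `split_by_distance` is a hypothetical name.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_by_distance(train: dict, held_out: dict, threshold: float = 0.4):
    """Label each held-out ligand 'near' or 'distant' by its
    nearest-neighbour similarity to any training ligand."""
    near, distant = [], []
    for name, fp in held_out.items():
        nn_sim = max(tanimoto(fp, t) for t in train.values())
        (near if nn_sim >= threshold else distant).append(name)
    return near, distant


train = {"t1": frozenset({1, 2, 3, 4}), "t2": frozenset({10, 11, 12})}
held_out = {"q1": frozenset({1, 2, 3, 5}), "q2": frozenset({20, 21})}
print(split_by_distance(train, held_out))  # (['q1'], ['q2'])
```

Comparing a model's accuracy on the near and distant groups gives one measure of how well it generalizes beyond the chemistry it was trained on.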


Subject(s)
Proteins/chemistry , Algorithms , Ligands , Models, Chemical , Protein Binding
17.
Brief Bioinform ; 21(5): 1549-1567, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31626279

ABSTRACT

Antibodies are proteins that recognize the molecular surfaces of potentially noxious molecules to mount an adaptive immune response or, in the case of autoimmune diseases, molecules that are part of healthy cells and tissues. Due to their binding versatility, antibodies are currently the largest class of biotherapeutics, with five monoclonal antibodies ranked in the top 10 blockbuster drugs. Computational advances in protein modelling and design can have a tangible impact on antibody-based therapeutic development. Antibody-specific computational protocols currently benefit from an increasing volume of data provided by next generation sequencing and application to related drug modalities based on traditional antibodies, such as nanobodies. Here we present a structured overview of available databases, methods and emerging trends in computational antibody analysis and contextualize them towards the engineering of candidate antibody therapeutics.


Subject(s)
Antibodies, Monoclonal/chemistry , Antibodies, Monoclonal/immunology , Antibodies, Monoclonal/therapeutic use , Computational Biology/methods , Databases, Protein , Molecular Docking Simulation , Protein Conformation
18.
J Comput Biol ; 27(8): 1219-1231, 2020 08.
Article in English | MEDLINE | ID: mdl-31874057

ABSTRACT

In many application domains, neural networks are highly accurate and have been deployed at large scale. However, users often do not have good tools for understanding how these models arrive at their predictions. This has hindered adoption in fields such as the life and medical sciences, where researchers require that models base their decisions on underlying biological phenomena rather than peculiarities of the dataset. We propose a set of methods for critiquing deep learning models and demonstrate their application for protein family classification, a task for which high-accuracy models have considerable potential impact. Our methods extend the Sufficient Input Subsets (SIS) technique, which we use to identify subsets of features in each protein sequence that are alone sufficient for classification. Our suite of tools analyzes these subsets to shed light on the decision-making criteria employed by models trained on this task. These tools show that while deep models may perform classification for biologically relevant reasons, their behavior varies considerably across the choice of network architecture and parameter initialization. While the techniques that we develop are specific to the protein sequence classification task, the approach taken generalizes to a broad set of scientific contexts in which model interpretability is essential.
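The backward-selection step behind Sufficient Input Subsets can be sketched for a sequence model as follows. This simplified version finds a single subset for a single input and omits the extensions described in the abstract; the mask character `X`, the stand-in classifier `toy_model`, and the function names are assumptions.

```python
MASK = "X"  # placeholder for a masked position (an assumption of this sketch)


def masked(seq: str, keep: set) -> str:
    """Replace every position not in `keep` with the mask character."""
    return "".join(c if i in keep else MASK for i, c in enumerate(seq))


def back_select(seq: str, f) -> list:
    """Greedily mask one position at a time, always choosing the position whose removal
    leaves the model score highest; return positions in the order they were removed."""
    remaining = set(range(len(seq)))
    order = []
    while remaining:
        # sorted() makes ties resolve to the smallest index
        best = max(sorted(remaining), key=lambda i: f(masked(seq, remaining - {i})))
        remaining.remove(best)
        order.append(best)
    return order


def sufficient_subset(seq: str, f, threshold: float):
    """Add positions back, last-removed first, until the kept positions alone reach the threshold."""
    keep = set()
    for i in reversed(back_select(seq, f)):
        keep.add(i)
        if f(masked(seq, keep)) >= threshold:
            return sorted(keep)
    return None  # even the full input does not reach the threshold


def toy_model(s: str) -> float:
    """Hypothetical classifier: fully confident once it sees two cysteines."""
    return min(1.0, s.count("C") / 2)


print(sufficient_subset("ACGCT", toy_model, 1.0))  # [1, 3]
```

The returned positions are the residues that, with everything else masked, are alone enough for the toy model to reach the threshold.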


Subject(s)
Computational Biology , Models, Biological , Multigene Family/genetics , Proteins/classification , Deep Learning , Humans , Machine Learning , Neural Networks, Computer , Proteins/genetics
19.
Phys Rev Lett ; 123(23): 238102, 2019 Dec 06.
Article in English | MEDLINE | ID: mdl-31868483

ABSTRACT

Collagen consists of three peptides twisted together through a periodic array of hydrogen bonds. Here we use this as inspiration to find design rules for programmed specific interactions for self-assembling synthetic collagenlike triple helices, starting from disordered configurations. The assembly generically nucleates defects in the triple helix, the characteristics of which can be manipulated by spatially varying the enthalpy of helix formation. Defect formation slows assembly, evoking kinetic pathologies that have been observed with mutations in the primary collagen amino acid sequence. The controlled formation and interaction between defects gives a route for hierarchical self-assembly of bundles of twisted filaments.


Subject(s)
Collagen/chemistry , Models, Chemical , Amino Acid Sequence , Models, Molecular , Nanostructures/chemistry , Peptides/chemistry , Protein Conformation, alpha-Helical
20.
Biophys J ; 117(3): 520-532, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31353036

ABSTRACT

The accurate prediction of RNA secondary structure from primary sequence has had enormous impact on research over the past 40 years. Although many algorithms are available to make these predictions, the inclusion of non-nested loops, termed pseudoknots, still poses challenges arising from two main factors: 1) no physical model exists to estimate the loop entropies of complex intramolecular pseudoknots, and 2) their NP-complete enumeration has impeded their study. Here, we address both challenges. First, we develop a polymer physics model that can address arbitrarily complex pseudoknots using only two parameters corresponding to concrete physical quantities, over an order of magnitude fewer than the sparsest state-of-the-art phenomenological methods. Second, by coupling this model to exhaustive enumeration of the set of possible structures, we compute the entire free energy landscape of secondary structures resulting from a primary RNA sequence. We demonstrate that for RNA structures of ∼80 nucleotides, with minimal heuristics, the complete enumeration of possible secondary structures can be accomplished quickly despite the NP-complete nature of the problem. We further show that despite our loop entropy model's parametric sparsity, it performs better than or on par with previously published methods in predicting both pseudoknotted and non-pseudoknotted structures on a benchmark data set of RNA structures of ≤80 nucleotides. We suggest ways in which the accuracy of the model can be further improved.
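Once every structure has been enumerated and assigned a free energy, the landscape can be turned into equilibrium populations by Boltzmann weighting. The sketch assumes free energies in kcal/mol and a default temperature of 37 °C (310.15 K); it is standard statistical mechanics rather than the paper's own code, and `boltzmann_probabilities` and the example structure labels are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def boltzmann_probabilities(free_energies: dict, temperature_k: float = 310.15) -> dict:
    """Equilibrium probability of each structure: exp(-G/RT) divided by the partition function."""
    beta = 1.0 / (R_KCAL * temperature_k)
    g_min = min(free_energies.values())
    # Shifting by the minimum free energy leaves the probabilities unchanged but avoids overflow.
    weights = {s: math.exp(-beta * (g - g_min)) for s, g in free_energies.items()}
    z = sum(weights.values())
    return {s: w / z for s, w in weights.items()}


probs = boltzmann_probabilities({"hairpin": -3.0, "pseudoknot": -2.5, "open": 0.0})
print(probs)
```

Structures with lower free energy receive exponentially larger weights, so a difference of a few kcal/mol is enough to make one fold dominate the landscape at body temperature.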


Subject(s)
Entropy , Nucleic Acid Conformation , Polymers/chemistry , RNA , Algorithms , RNA/chemistry , Thermodynamics